Expoloring data


Filtering counts

Figure 1: Size factors of initial oyster samples in the study, calculated using oyster gene count data.

After viewing the size factors, it appears that we have 5 samples with really low values that should be removed:

  • 19355
  • 19372
  • 19382
  • 19496
  • 19505


Figure 2: Size factors of oyster samples remaining in the study after 5 outliers were removed, calculated using oyster gene count data.

Without filtering low reads, we had a total of 38828 counts. After filtering out the low counts (those with a base mean less than 3), we now have 24384 counts remaining for all C. virginica samples.



PCA plots of Global Gene Expression

# going to use the filter_counts_out object created above because it is filtered for low reads and has outliers removed

## create the DESeq object
dds <- DESeqDataSetFromMatrix(countData = filter_counts_out, 
                              colData = expDesign,
                              design = ~ treat)
dds <- DESeq(dds) # differential expression analysis on gamma-poisson distribution
vsd <- varianceStabilizingTransformation(dds, blind = TRUE) # quickly estimate dispersion trend and apply a variance stabilizing transformation

## saving the rlog for DEG heatmap
rlog <- DESeq2::rlog(dds, blind = TRUE) #for use later on


## Save dds and vsd data 
save(dds, vsd, rlog, file = "Data/transformed_counts.RData")
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 1500
## 
## adonis2(formula = pcaData[6:31] ~ pcaData$infect * pcaData$pCO2, permutations = 1500, method = "eu")
##                             Df SumOfSqs      R2      F Pr(>F)
## pcaData$infect               1   1118.3 0.04736 1.1893 0.1799
## pcaData$pCO2                 1    945.3 0.04003 1.0053 0.3817
## pcaData$infect:pcaData$pCO2  1    864.0 0.03659 0.9188 0.5623
## Residual                    22  20686.8 0.87603              
## Total                       25  23614.3 1.00000


Figure 3: PCA plot with each point being one oyster sample colored by treatment. Light blue represents the control treatment (no infection, pCO2 level 400), dark blue represents no infection but high pCO2, dark red represents samples infected with boring sponge at pCO2 2800, and orange represents samples infected with boring sponge at pCO2 400.



Differential gene expression


DEGs

## log2 fold change (MLE): treat S 400 vs N 2800 
## Wald test p-value: treat S 400 vs N 2800 
## DataFrame with 24384 rows and 6 columns
##               baseMean log2FoldChange     lfcSE       stat    pvalue      padj
##              <numeric>      <numeric> <numeric>  <numeric> <numeric> <numeric>
## LOC111126949  3.452259     -0.9422247  0.817784 -1.1521686  0.249252  0.999652
## LOC111110729  0.485466     -0.5124481  1.463646 -0.3501175  0.726251  0.999652
## LOC111120752  1.693621     -0.2827063  0.846812 -0.3338478  0.738494  0.999652
## LOC111113860  5.933380     -0.0448271  0.601266 -0.0745544  0.940569  0.999652
## LOC111109550  0.331392     -0.9860539  2.483653 -0.3970176  0.691355  0.999652
## ...                ...            ...       ...        ...       ...       ...
## LOC111117112 0.6463025      1.6788746   1.80337  0.9309629  0.351873  0.999652
## LOC111117113 0.1436384      0.7681374   3.83879  0.2000987  0.841403  0.999652
## LOC111117117 0.0778208      0.2872407   4.24705  0.0676329  0.946078  0.999652
## LOC111116908 0.1097457     -0.0817811   4.24580 -0.0192617  0.984632  0.999652
## LOC111117715 0.6210268     -0.4455042   1.67997 -0.2651862  0.790866  0.999652
## 
## out of 24380 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up)       : 2, 0.0082%
## LFC < 0 (down)     : 0, 0%
## outliers [1]       : 36, 0.15%
## low counts [2]     : 3, 0.012%
## (mean count < 0)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
summary(res_con_sponge)
## 
## out of 24380 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up)       : 0, 0%
## LFC < 0 (down)     : 0, 0%
## outliers [1]       : 36, 0.15%
## low counts [2]     : 3, 0.012%
## (mean count < 0)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
summary(res_con_pco2)
## 
## out of 24380 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up)       : 0, 0%
## LFC < 0 (down)     : 0, 0%
## outliers [1]       : 36, 0.15%
## low counts [2]     : 3, 0.012%
## (mean count < 0)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
summary(res_trt)
## 
## out of 24380 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up)       : 6, 0.025%
## LFC < 0 (down)     : 5, 0.021%
## outliers [1]       : 36, 0.15%
## low counts [2]     : 11339, 47%
## (mean count < 2)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
summary(res_trt_pco2)
## 
## out of 24380 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up)       : 3, 0.012%
## LFC < 0 (down)     : 5, 0.021%
## outliers [1]       : 36, 0.15%
## low counts [2]     : 14166, 58%
## (mean count < 3)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
summary(res_stn_pco2)
## 
## out of 24380 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up)       : 2, 0.0082%
## LFC < 0 (down)     : 0, 0%
## outliers [1]       : 36, 0.15%
## low counts [2]     : 3, 0.012%
## (mean count < 0)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
summary(res_shn_pco2)
## 
## out of 24380 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up)       : 1, 0.0041%
## LFC < 0 (down)     : 3, 0.012%
## outliers [1]       : 36, 0.15%
## low counts [2]     : 3, 0.012%
## (mean count < 0)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
## [1] "N_400, S_2800"

Figure: Bar plot of significant DEGs. The y axis denotes how many genes there are, with positive values being up-regulated genes, and negative values denoting down-regulated genes.



Paired Venns of DEGs

Figure 4: Venn diagrams depicting number of shared significantly up-regulated / down-regulated genes. Orange venn diagrams depict up-regulated genes, while blue depicts down-regulated genes.



Full Venns of DEGs

Need to add all treatments with DEGs to the below code (ANGELA)

Figure 5: left: four-way venn diagram showing relationships among up-regulated genes between different treatment comparisons. right: venn diagram showing relationships among down-regulated genes between treatment comparisons with significantly down-regulated genes.



Gene expression plasticity



ANOVA Tests on Plasticity Data

ANOVA test on plasticity data with 2 PCAs (looking at variable treat to see if there is any significant difference of dist)

##             Df Sum Sq Mean Sq F value Pr(>F)
## treat        2  348.9  174.44   1.835   0.19
## Residuals   17 1616.2   95.07
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = dist ~ treat, data = ge_plast_2)
## 
## $treat
##                    diff        lwr      upr     p adj
## S_2800-N_2800  9.903288  -3.466771 23.27335 0.1689911
## S_400-N_2800   6.107834  -7.808165 20.02383 0.5117339
## S_400-S_2800  -3.795455 -17.711454 10.12054 0.7668884

ANOVA test on plasticity data with all PCAs (looking at variable treat to see if there is any significant difference of dist)

##             Df Sum Sq Mean Sq F value Pr(>F)
## treat        2   89.9   44.96   1.967   0.17
## Residuals   17  388.6   22.86
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = dist ~ treat, data = ge_plast_all)
## 
## $treat
##                    diff       lwr       upr     p adj
## S_2800-N_2800  4.993713 -1.562429 11.549855 0.1542649
## S_400-N_2800   3.289942 -3.533907 10.113791 0.4485889
## S_400-S_2800  -1.703771 -8.527620  5.120078 0.8000952



ANOVA test on plasticity data with all PCAs (looking at variable pCO2 * infect to see if there is any significant difference of dist)

##             Df Sum Sq Mean Sq F value Pr(>F)  
## pCO2         1    2.6    2.64   0.116 0.7381  
## infect       1   87.3   87.28   3.818 0.0674 .
## Residuals   17  388.6   22.86                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Call:
##    aov(formula = dist ~ pCO2 * infect, data = ge_plast_all)
## 
## Terms:
##                     pCO2   infect Residuals
## Sum of Squares    2.6417  87.2801  388.6136
## Deg. of Freedom        1        1        17
## 
## Residual standard error: 4.781174
## 1 out of 4 effects not estimable
## Estimated effects may be unbalanced
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = dist ~ pCO2 * infect, data = ge_plast_all)
## 
## $pCO2
##                diff      lwr     upr     p adj
## 2800-400 -0.7930852 -5.71523 4.12906 0.7380613
## 
## $infect
##         diff       lwr      upr     p adj
## S-N 3.841318 -0.887726 8.570362 0.1047456



ANOVA test on plasticity data with all PCAs (looking at variable pCO2 * infect to see if there is any significant difference of dist)

##             Df Sum Sq Mean Sq F value Pr(>F)  
## pCO2         1    5.6     5.6   0.059 0.8109  
## infect       1  343.3   343.3   3.611 0.0745 .
## Residuals   17 1616.2    95.1                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Call:
##    aov(formula = dist ~ pCO2 * infect, data = ge_plast_2)
## 
## Terms:
##                      pCO2    infect Residuals
## Sum of Squares     5.6144  343.2629 1616.1736
## Deg. of Freedom         1         1        17
## 
## Residual standard error: 9.750335
## 1 out of 4 effects not estimable
## Estimated effects may be unbalanced
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = dist ~ pCO2 * infect, data = ge_plast_2)
## 
## $pCO2
##               diff       lwr     upr     p adj
## 2800-400 -1.156189 -11.19401 8.88163 0.8109013
## 
## $infect
##         diff       lwr      upr     p adj
## S-N 7.617914 -2.026111 17.26194 0.1139175



ANOVA test on plasticity data with all PCAs (looking at variable treat to see if there is any significant difference of dist)

##             Df Sum Sq Mean Sq F value Pr(>F)
## treat        2   89.9   44.96   1.967   0.17
## Residuals   17  388.6   22.86
## Call:
##    aov(formula = dist ~ treat, data = ge_plast_all)
## 
## Terms:
##                    treat Residuals
## Sum of Squares   89.9218  388.6136
## Deg. of Freedom        2        17
## 
## Residual standard error: 4.781174
## Estimated effects may be unbalanced
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = dist ~ treat, data = ge_plast_all)
## 
## $treat
##                    diff       lwr       upr     p adj
## S_2800-N_2800  4.993713 -1.562429 11.549855 0.1542649
## S_400-N_2800   3.289942 -3.533907 10.113791 0.4485889
## S_400-S_2800  -1.703771 -8.527620  5.120078 0.8000952



Plasticity Plots

Figure 6: Assessment of plasticity (A-D) calculated as the average of the distances of each sample to the mean eigenvalue of the control (N_400). The control was omitted from this analysis. A) and C) pertains to analysis of all PCAs, while B) and D) contain analysis of 2 PCAs. A-B depicts box plots of plasticity and each individual sample as a point, while C-D shows the mean and standard error.

Results

Infected oyster samples in the 2800 pCO2 treatment exhibited the highest plasticity (Figure __), but results from the ANOVA test with all PCAs suggest this was not statistically significant (p = 0.17, F = .967). Oysters not infected with sponge and in the 2800 treatment had the lowest plasticity (Figure _).



GO for DESeq

Figure SX

All significant GO terms identified from DEGs:

BP_DEG_tree:BP_DEG_tree CC_DEG_tree:CC_DEG_tree MF_DEG_tree:MF_DEG_tree

Example of a heatmap for a GO term

Trying Dan’s way…

All the other heatmaps!

##                 19086    19103    19264    19370    19516    19642    19158
## LOC111126949 3.899572 2.929919 3.378275 2.708226 3.937815 2.859799 2.155749
## LOC111110729 2.849025 2.929919 2.155749 2.708226 2.155749 2.155749 2.692885
## LOC111120752 3.652839 3.238205 3.032737 3.101499 3.173819 2.859799 3.212457
## LOC111113860 3.782714 2.929919 3.930798 4.256921 2.751561 3.525798 3.848713
## LOC111109550 2.155749 2.155749 2.780571 2.155749 2.155749 2.155749 2.155749
## LOC111109753 4.006094 2.929919 3.839633 2.932439 4.086951 2.155749 2.155749
##                 19254    19274    19304    19467    19472    19549
## LOC111126949 2.155749 4.063514 3.394909 3.580107 3.436147 3.976749
## LOC111110729 2.857209 2.155749 2.155749 2.155749 2.155749 2.939827
## LOC111120752 2.857209 2.155749 3.236722 2.999821 2.981083 2.155749
## LOC111113860 3.800426 3.374452 3.045007 3.780792 4.130857 3.110394
## LOC111109550 2.155749 3.160253 2.155749 2.756791 2.155749 2.155749
## LOC111109753 3.521012 2.155749 2.155749 2.999821 2.155749 3.110394



Colleen adding in WGCNA

Mean expression across samples

##  Flagging genes and samples with too many missing values...
##   ..step 1
## [1] TRUE

Dendrogram with sample traits

One sample (19611) was flagged as an outlier and removed.

## pickSoftThreshold: will use block size 3494.
##  pickSoftThreshold: calculating connectivity for given powers...
##    ..working on genes 1 through 3494 of 12801
##    ..working on genes 3495 through 6988 of 12801
##    ..working on genes 6989 through 10482 of 12801
##    ..working on genes 10483 through 12801 of 12801
##    Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k.
## 1      1  0.05780 17.30          0.981 6420.00   6420.00 6690.0
## 2      2  0.00104  1.14          0.982 3380.00   3380.00 3690.0
## 3      3  0.14900 -8.03          0.955 1860.00   1850.00 2190.0
## 4      4  0.36200 -8.11          0.945 1060.00   1050.00 1370.0
## 5      5  0.53800 -6.96          0.929  627.00    616.00  908.0
## 6      6  0.73900 -6.88          0.958  381.00    371.00  648.0
## 7      7  0.82800 -6.15          0.968  238.00    228.00  480.0
## 8      8  0.88400 -5.47          0.976  152.00    144.00  366.0
## 9      9  0.91600 -5.01          0.983   99.60     92.70  287.0
## 10    10  0.93900 -4.60          0.987   66.50     60.80  230.0
## 11    12  0.96300 -3.96          0.993   31.30     27.40  154.0
## 12    14  0.96200 -3.57          0.989   15.70     13.00  109.0
## 13    16  0.97000 -3.20          0.992    8.37      6.51   80.0
## 14    18  0.95800 -2.99          0.986    4.68      3.39   60.4
## 15    20  0.96100 -2.77          0.991    2.74      1.84   46.7

## quartz_off_screen 
##                 2

##  mergeCloseModules: Merging modules whose distance is less than 0.5
##    multiSetMEs: Calculating module MEs.
##      Working on set 1 ...
##      moduleEigengenes: Calculating 51 module eigengenes in given set.
##    multiSetMEs: Calculating module MEs.
##      Working on set 1 ...
##      moduleEigengenes: Calculating 21 module eigengenes in given set.
##    multiSetMEs: Calculating module MEs.
##      Working on set 1 ...
##      moduleEigengenes: Calculating 16 module eigengenes in given set.
##    multiSetMEs: Calculating module MEs.
##      Working on set 1 ...
##      moduleEigengenes: Calculating 14 module eigengenes in given set.
##    Calculating new MEs...
##    multiSetMEs: Calculating module MEs.
##      Working on set 1 ...
##      moduleEigengenes: Calculating 14 module eigengenes in given set.
FALSE quartz_off_screen 
FALSE                 2

Figure: Cluster Dendrogram

Heatmap and gene bar graph



WGCNA correlation plots

Figure SXX. All identified WGCNA modules correlated against significant traits with R2 and p values.



WGCNA heatmaps

Figure SXX. Heatmap and barplots of samples for all significant traits identified within each WGCNA module.



Session Information

Session information from the last full knit of Rmarkdown on 28 July 2022.

## R version 4.2.0 (2022-04-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] grid      stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] cowplot_1.1.1               DEGreport_1.33.1           
##  [3] pheatmap_1.0.12             RColorBrewer_1.1-3         
##  [5] gplots_3.1.3                gridExtra_2.3              
##  [7] WGCNA_1.71                  fastcluster_1.2.3          
##  [9] dynamicTreeCut_1.63-1       flashClust_1.01-2          
## [11] ggvenn_0.1.9                adegenet_2.1.7             
## [13] ade4_1.7-19                 ggrepel_0.9.1              
## [15] pdftools_3.3.0              ggpubr_0.4.0               
## [17] data.table_1.14.2           vegan_2.6-2                
## [19] lattice_0.20-45             permute_0.9-7              
## [21] DESeq2_1.36.0               SummarizedExperiment_1.26.1
## [23] Biobase_2.56.0              MatrixGenerics_1.8.1       
## [25] matrixStats_0.62.0          GenomicRanges_1.48.0       
## [27] GenomeInfoDb_1.32.2         IRanges_2.30.0             
## [29] S4Vectors_0.34.0            BiocGenerics_0.42.0        
## [31] plotly_4.10.0               forcats_0.5.1              
## [33] stringr_1.4.0               dplyr_1.0.9                
## [35] purrr_0.3.4                 readr_2.1.2                
## [37] tidyr_1.2.0                 tibble_3.1.7               
## [39] ggplot2_3.3.6               tidyverse_1.3.1            
## [41] knitr_1.39                 
## 
## loaded via a namespace (and not attached):
##   [1] utf8_1.2.2                  tidyselect_1.1.2           
##   [3] RSQLite_2.2.14              AnnotationDbi_1.58.0       
##   [5] htmlwidgets_1.5.4           BiocParallel_1.30.3        
##   [7] munsell_0.5.0               codetools_0.2-18           
##   [9] ragg_1.2.2                  preprocessCore_1.58.0      
##  [11] interp_1.1-2                withr_2.5.0                
##  [13] colorspace_2.0-3            highr_0.9                  
##  [15] rstudioapi_0.13             ggsignif_0.6.3             
##  [17] labeling_0.4.2              lasso2_1.2-22              
##  [19] GenomeInfoDbData_1.2.8      mnormt_2.1.0               
##  [21] bit64_4.0.5                 farver_2.1.1               
##  [23] vctrs_0.4.1                 generics_0.1.3             
##  [25] xfun_0.31                   qpdf_1.2.0                 
##  [27] R6_2.5.1                    doParallel_1.0.17          
##  [29] clue_0.3-61                 locfit_1.5-9.6             
##  [31] reshape_0.8.9               bitops_1.0-7               
##  [33] cachem_1.0.6                DelayedArray_0.22.0        
##  [35] assertthat_0.2.1            vroom_1.5.7                
##  [37] promises_1.2.0.1            scales_1.2.0               
##  [39] nnet_7.3-17                 gtable_0.3.0               
##  [41] rlang_1.0.4                 genefilter_1.78.0          
##  [43] systemfonts_1.0.4           GlobalOptions_0.1.2        
##  [45] splines_4.2.0               rstatix_0.7.0              
##  [47] lazyeval_0.2.2              impute_1.70.0              
##  [49] broom_1.0.0                 checkmate_2.1.0            
##  [51] yaml_2.3.5                  reshape2_1.4.4             
##  [53] abind_1.4-5                 modelr_0.1.8               
##  [55] backports_1.4.1             httpuv_1.6.5               
##  [57] Hmisc_4.7-0                 tools_4.2.0                
##  [59] psych_2.2.5                 logging_0.10-108           
##  [61] ellipsis_0.3.2              jquerylib_0.1.4            
##  [63] ggdendro_0.1.23             Rcpp_1.0.9                 
##  [65] plyr_1.8.7                  base64enc_0.1-3            
##  [67] zlibbioc_1.42.0             RCurl_1.98-1.7             
##  [69] rpart_4.1.16                deldir_1.0-6               
##  [71] GetoptLong_1.0.5            haven_2.5.0                
##  [73] cluster_2.1.3               fs_1.5.2                   
##  [75] magrittr_2.0.3              circlize_0.4.15            
##  [77] reprex_2.0.1                hms_1.1.1                  
##  [79] mime_0.12                   evaluate_0.15              
##  [81] xtable_1.8-4                XML_3.99-0.10              
##  [83] jpeg_0.1-9                  readxl_1.4.0               
##  [85] shape_1.4.6                 compiler_4.2.0             
##  [87] KernSmooth_2.23-20          crayon_1.5.1               
##  [89] htmltools_0.5.2             mgcv_1.8-40                
##  [91] later_1.3.0                 tzdb_0.3.0                 
##  [93] Formula_1.2-4               geneplotter_1.74.0         
##  [95] lubridate_1.8.0             DBI_1.1.3                  
##  [97] ComplexHeatmap_2.12.0       dbplyr_2.2.1               
##  [99] MASS_7.3-57                 Matrix_1.4-1               
## [101] car_3.1-0                   cli_3.3.0                  
## [103] parallel_4.2.0              igraph_1.3.2               
## [105] pkgconfig_2.0.3             foreign_0.8-82             
## [107] xml2_1.3.3                  foreach_1.5.2              
## [109] annotate_1.74.0             bslib_0.3.1                
## [111] XVector_0.36.0              rvest_1.0.2                
## [113] digest_0.6.29               ConsensusClusterPlus_1.60.0
## [115] Biostrings_2.64.0           rmarkdown_2.14             
## [117] cellranger_1.1.0            htmlTable_2.4.1            
## [119] edgeR_3.38.1                gtools_3.9.3               
## [121] shiny_1.7.1                 rjson_0.2.21               
## [123] lifecycle_1.0.1             nlme_3.1-158               
## [125] jsonlite_1.8.0              carData_3.0-5              
## [127] seqinr_4.2-16               limma_3.52.2               
## [129] viridisLite_0.4.0           askpass_1.1                
## [131] fansi_1.0.3                 pillar_1.7.0               
## [133] KEGGREST_1.36.2             fastmap_1.1.0              
## [135] httr_1.4.3                  survival_3.3-1             
## [137] GO.db_3.15.0                glue_1.6.2                 
## [139] png_0.1-7                   iterators_1.0.14           
## [141] bit_4.0.4                   stringi_1.7.8              
## [143] sass_0.4.1                  blob_1.2.3                 
## [145] textshaping_0.3.6           caTools_1.18.2             
## [147] latticeExtra_0.6-30         memoise_2.0.1              
## [149] ape_5.6-2